Web Scraping with Python: Collecting More Data from the Modern Web by Ryan Mitchell


Author: Ryan Mitchell
Language: eng
Format: azw3
Publisher: O'Reilly Media
Published: 2018-03-21


1

Although many of the techniques described in this chapter can be applied to all or most languages, it’s okay for now to focus on natural language processing in English only. Tools such as Python’s Natural Language Toolkit focus on English. Fifty-six percent of the internet is still in English (with German following at a mere 6%, according to W3Techs). But who knows? English’s hold on the majority of the internet will almost certainly change in the future, and further updates may be necessary in the next few years.

2

Oriol Vinyals et al., “A Picture Is Worth a Thousand (Coherent) Words: Building a Natural Description of Images,” Google Research Blog, November 17, 2014.

3

The exception is the last word in the text, because nothing follows the last word. In our example text, the last word is a period (.), which is convenient because it has 215 other occurrences in the text and so does not represent a dead-end. However, in real-world implementations of the Markov generator, the last word of the text might be something you need to account for.
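One way to account for a dead-end last word is sketched below. This is a minimal illustration, not the chapter's own code: the function names (`build_word_dict`, `next_word`, `generate`) and the choice to restart from the first word of the text are assumptions made for this example. Other strategies, such as picking a random word or simply stopping, would work equally well.

```python
import random
from collections import defaultdict


def build_word_dict(words):
    """Map each word to a dict of {following_word: count}.

    The final word of the text only becomes a key if it also
    appears somewhere earlier in the text.
    """
    word_dict = defaultdict(lambda: defaultdict(int))
    for current, following in zip(words, words[1:]):
        word_dict[current][following] += 1
    return word_dict


def next_word(word_dict, current, fallback, rng=random):
    """Pick a follower of `current`, weighted by how often it occurs.

    If `current` has no recorded followers (a dead end), return
    `fallback` instead. The fallback policy is an assumption.
    """
    followers = word_dict.get(current)
    if not followers:
        return fallback
    choices = list(followers.keys())
    weights = list(followers.values())
    return rng.choices(choices, weights=weights, k=1)[0]


def generate(words, length, seed=None):
    """Generate a chain of `length` words (length >= 1)."""
    rng = random.Random(seed)
    word_dict = build_word_dict(words)
    current = words[0]
    chain = [current]
    for _ in range(length - 1):
        # Restart from the first word whenever a dead end is reached.
        current = next_word(word_dict, current, fallback=words[0], rng=rng)
        chain.append(current)
    return chain


if __name__ == "__main__":
    sample = "a b a b end".split()  # "end" occurs only as the last word
    print(" ".join(generate(sample, 12, seed=1)))
```

In the sample, `end` never appears as a key in the dictionary because nothing follows it, so the generator falls back to the first word rather than raising an error.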


